AITopics | inflected form


inflected form


A Simple Joint Model for Improved Contextual Neural Lemmatization

Malaviya, Chaitanya, Wu, Shijie, Cotterell, Ryan

arXiv.org Artificial Intelligence, May-28-2024

English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially helpful in low-resource lemmatization and languages that display a larger degree of morphological ...

Figure 1: Our structured neural model shown as a hybrid (directed-undirected) graphical model (Koller and Friedman, 2009).

computational linguistic, lemmatization, linguistic, (14 more...)

arXiv.org Artificial Intelligence

1904.02306

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


J-UniMorph: Japanese Morphological Annotation through the Universal Feature Schema

Matsuzaki, Kosuke, Taniguchi, Masaya, Inui, Kentaro, Sakaguchi, Keisuke

arXiv.org Artificial Intelligence, Feb-22-2024

We introduce a Japanese Morphology dataset, J-UniMorph, developed based on the UniMorph feature schema. This dataset addresses the unique and rich verb forms characteristic of the language's agglutinative nature. J-UniMorph distinguishes itself from the existing Japanese subset of UniMorph, which is automatically extracted from Wiktionary. On average, the Wiktionary Edition features around 12 inflected forms for each word and is dominated by denominal verbs (i.e., [noun] + suru (do-PRS)). Morphologically, this form is equivalent to the verb suru (do). In contrast, J-UniMorph explores a much broader and more frequently used range of verb forms, offering 118 inflected forms for each word on average. It includes honorifics, a range of politeness levels, and other linguistic nuances, emphasizing the distinctive characteristics of the Japanese language. This paper presents detailed statistics and characteristics of J-UniMorph, comparing it with the Wiktionary Edition. We make J-UniMorph and its interactive visualizer publicly available, aiming to support cross-linguistic research and various applications.

expression, inflected form, verb, (14 more...)

arXiv.org Artificial Intelligence

2402.14411

Country:

Asia > Japan > Honshū > Tōhoku (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
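
The J-UniMorph abstract above compares datasets by the average number of inflected forms per word. As a rough sketch of how such a statistic could be computed, the snippet below assumes the common UniMorph layout of tab-separated lemma, inflected form, and semicolon-separated feature bundle. The sample rows and their feature tags are illustrative assumptions, not entries copied from J-UniMorph, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative rows only (assumed layout: lemma <TAB> form <TAB> features).
SAMPLE = (
    "食べる\t食べた\tV;PST\n"
    "食べる\t食べます\tV;PRS;POL\n"
    "食べる\t食べる\tV;PRS\n"
    "する\tした\tV;PST\n"
)

def forms_per_lemma(tsv_text):
    """Count the distinct inflected forms listed for each lemma."""
    forms = defaultdict(set)
    for line in tsv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        lemma, form, _features = line.split("\t")
        forms[lemma].add(form)
    return {lemma: len(found) for lemma, found in forms.items()}

def average_forms(tsv_text):
    """Average number of distinct inflected forms per lemma."""
    counts = forms_per_lemma(tsv_text)
    return sum(counts.values()) / len(counts) if counts else 0.0

print(forms_per_lemma(SAMPLE))
print(average_forms(SAMPLE))
```

Counting distinct forms per lemma, rather than rows, keeps the figure stable when one surface form carries several feature bundles.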


Lexicon and Rule-based Word Lemmatization Approach for the Somali Language

Mohamed, Shafie Abdi, Mohamed, Muhidin Abdullahi

arXiv.org Artificial Intelligence, Aug-3-2023

The lemmatization summary statistics of the Example 3 sentence are also provided in Table 1. In this case, the percentage of words that were normalized for the example reached 100%, which means that all content words (excluding stop words and special characters) are lemmatized. This may be because the document is short, a sentence of 8 words. Unlike in this example, a proportion of words in any typical text document (i.e., longer than a sentence) will normally remain unresolved: words that the algorithm fails to lemmatize in both stages. As part of evaluating the proposed method, we tested the algorithm on 120 documents of various lengths, including general news articles and social media posts. For the news articles, we used extracts (i.e., title and first 1-2 paragraphs) as well as the full articles to see the effect of document length. The results for these document categories are summarized in Table 2. The notations #Docs, Avg Doc Len, and Avg Acc. in the table respectively represent the number of documents, average document length in words, and average lemmatization accuracy. As shown, the results demonstrate that the algorithm achieves a relatively good accuracy of 57% for moderately long documents (e.g.

lemmatization, lexicon, root word, (14 more...)

arXiv.org Artificial Intelligence

2308.01785

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Africa > Middle East > Somalia > Banaadir > Mogadishu (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
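
The Somali lemmatization excerpt above refers to two stages and to the percentage of content words that end up normalized. The sketch below shows one way a lexicon lookup followed by suffix rules, plus that coverage figure, might be wired together. It uses toy English data purely for illustration: the lexicon, rules, stop words, and function names are all assumptions and do not reproduce the paper's Somali resources or algorithm.

```python
# Toy resources (assumed for illustration; not the paper's Somali data).
LEXICON = {"went": "go", "children": "child", "better": "good"}
KNOWN_ROOTS = {"walk", "story", "book", "go", "child", "good"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]
STOP_WORDS = {"the", "a", "to", "and"}

def lemmatize(word):
    """Return a lemma, or None if both stages fail."""
    # Stage 1: direct lexicon lookup (irregular forms and known roots).
    if word in LEXICON:
        return LEXICON[word]
    if word in KNOWN_ROOTS:
        return word
    # Stage 2: strip a suffix and accept the result only if it is a known root.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in KNOWN_ROOTS:
                return candidate
    return None

def coverage(tokens):
    """Percentage of content words (non stop words) that were lemmatized."""
    content = [t for t in tokens if t not in STOP_WORDS]
    if not content:
        return 0.0
    resolved = sum(1 for t in content if lemmatize(t) is not None)
    return 100.0 * resolved / len(content)

print(coverage("the children walked to the stories".split()))
```

Words that fall through both stages return None, which mirrors the "unresolved" words the excerpt mentions for longer documents.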


A big data approach towards sarcasm detection in Russian

Gurin, A. A., Sadykov, T. M., Zhukov, T. A.

arXiv.org Artificial Intelligence, Jun-1-2023

We present a set of deterministic algorithms for Russian inflection and automated text synthesis. These algorithms are implemented in a publicly available web service, www.passare.ru. This service provides functions for inflection of single words, word matching and synthesis of grammatically correct Russian text. Selected code and datasets are available at https://github.com/passare-ru/PassareFunctions/. Performance of the inflectional functions has been tested against OpenCorpora, the annotated corpus of the Russian language, compared with that of other solutions, and used for estimating the morphological variability and complexity of different parts of speech in Russian.

inflection, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.00445

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Social Media (0.95)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


Advancing Full-Text Search Lemmatization Techniques with Paradigm Retrieval from OpenCorpora

Kalugin-Balashov, Dmitriy

arXiv.org Artificial Intelligence, May-18-2023

In full-text search applications, the primary goal is to effectively retrieve and match relevant documents based on user queries. By focusing on finding the first form, or the lemma, of a word, the search process can be streamlined and optimized. The lemma serves as a normalized representation of a word's different inflected forms, allowing for a more accurate comparison between user queries and document content. This approach reduces the complexity and computational overhead associated with full morphological analysis, which includes extracting all possible forms of a word along with their grammatical properties. By prioritizing lemma retrieval, full-text search engines can achieve faster response times and more precise results, while minimizing the resources required for processing large volumes of text data. Consequently, building upon the foundation of pymorphy[1], the golemma library was developed to address the challenge of efficiently identifying the first form, or lemma, of words in the Russian language.

information retrieval, natural language, paradigm, (14 more...)

arXiv.org Artificial Intelligence

2305.10848

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.30)
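
The full-text search abstract above argues that matching on lemmas lets a query and a document agree even when they use different inflected forms. A minimal sketch of that idea follows, using a tiny hand-written Russian lemma table and an inverted index keyed by lemma. The table and function names are assumptions for illustration; this is not the golemma or pymorphy API.

```python
# Hand-written toy lemma table (an assumption; a real system would use a
# morphological dictionary such as one derived from OpenCorpora).
TOY_LEMMAS = {
    "кошки": "кошка",
    "кошку": "кошка",
    "кошек": "кошка",
    "спят": "спать",
    "спит": "спать",
}

def lemma(token):
    """Map a token to its lemma, falling back to the token itself."""
    return TOY_LEMMAS.get(token, token)

def build_index(docs):
    """Build an inverted index from lemma to the set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(lemma(token), set()).add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query lemma."""
    lemmas = [lemma(t) for t in query.lower().split()]
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))
    for item in lemmas[1:]:
        result &= index.get(item, set())
    return result

docs = {1: "Кошки спят", 2: "Я вижу кошку", 3: "Собака спит"}
index = build_index(docs)
print(search(index, "кошка"))
print(search(index, "кошек спит"))
```

Because both the documents and the query pass through the same `lemma` function, only one lookup per token is needed and no full set of grammatical properties has to be computed.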


K-UniMorph: Korean Universal Morphology and its Feature Schema

Jo, Eunkyul Leah, Kim, Kyuwon, Wu, Xihan, Lim, KyungTae, Park, Jungyeul, Park, Chulwoo

arXiv.org Artificial Intelligence, May-17-2023

We present in this work a new Universal Morphology dataset for Korean. Previously, the Korean language has been underrepresented in the field of morphological paradigms amongst hundreds of diverse world languages. Hence, we propose Universal Morphological paradigms for the Korean language that preserve its distinct characteristics. For our K-UniMorph dataset, we outline each grammatical criterion in detail for the verbal endings, clarify how to extract inflected forms, and demonstrate how we generate the morphological schemata. This dataset adopts the morphological feature schema from Sylak-Glassman et al. (2015) and Sylak-Glassman (2016) for the Korean language, as we extract inflected verb forms from the Sejong morphologically analyzed corpus, which is one of the largest annotated corpora for Korean. During the data creation, our methodology also includes investigating the correctness of the conversion from the Sejong corpus. Furthermore, we carry out the inflection task using three different Korean word forms: letters, syllables and morphemes. Finally, we discuss and describe future perspectives on Korean morphological paradigms and the dataset.

artificial intelligence, ending, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.06335

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Île-de-France > Paris > Paris (0.06)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
(16 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)


Inflected Forms Are Redundant in Question Generation Models

Sun, Xingwu, Tang, Hongyin, Xu, Chengzhong

arXiv.org Artificial Intelligence, Jan-1-2023

Neural models with an encoder-decoder framework provide a feasible solution to Question Generation (QG). However, after analyzing the model vocabulary we find that current models (both RNN-based and pre-training based) have more than 23% inflected forms. As a result, the encoder will generate separate embeddings for the inflected forms, leading to a waste of training data and parameters. Even worse, in decoding these models are vulnerable to irrelevant noise and they suffer from high computational costs. In this paper, we propose an approach to enhance the performance of QG by fusing word transformation. Firstly, we identify the inflected forms of words from the input of the encoder, and replace them with the root words, letting the encoder pay more attention to the repetitive root words. Secondly, we propose to adapt QG as a combination of the following actions in the encoder-decoder framework: generating a question word, copying a word from the source sequence or generating a word transformation type. Such an extension can greatly decrease the size of predicted words in the decoder as well as noise. We apply our approach to a typical RNN-based model and UniLM to get the improved versions. We conduct extensive experiments on SQuAD and MS MARCO datasets. The experimental results show that the improved versions can significantly outperform the corresponding baselines in terms of BLEU, ROUGE-L and METEOR as well as time cost.

machine learning, natural language, question generation model, (4 more...)

arXiv.org Artificial Intelligence

2301.00397

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.53)
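
The question generation abstract above describes replacing inflected forms with root words on the encoder side and predicting a word transformation type on the decoder side. The sketch below illustrates only the data transformation, not the neural model: tokens are split into a root plus a transformation label, and the surface form is recovered from that pair. The lookup table and label names are hypothetical and are not taken from the paper.

```python
# Hypothetical table: inflected form -> (root word, transformation type).
TOY_TABLE = {
    "talks": ("talk", "VERB_3SG"),
    "talked": ("talk", "VERB_PAST"),
    "talking": ("talk", "VERB_ING"),
    "cities": ("city", "NOUN_PL"),
}
# Inverse table used to rebuild the surface form from root + transformation.
INVERSE = {(root, kind): form for form, (root, kind) in TOY_TABLE.items()}

def normalize(tokens):
    """Replace each inflected form with a (root, transformation type) pair."""
    return [TOY_TABLE.get(token, (token, "NONE")) for token in tokens]

def realize(pairs):
    """Apply the transformation type to each root to recover the surface form."""
    return [INVERSE.get((root, kind), root) for root, kind in pairs]

def root_vocabulary(tokens):
    """Vocabulary after normalization: only root words remain."""
    return {root for root, _kind in normalize(tokens)}

sentence = ["she", "talked", "about", "cities"]
pairs = normalize(sentence)
print(pairs)
print(realize(pairs))
```

In this toy setting the four surface forms talk, talks, talked and talking collapse to the single root talk, which is the kind of vocabulary reduction the abstract attributes to removing inflected forms.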


Modelling Morphological Features

#artificialintelligence, Nov-9-2020, 11:30:08 GMT

This is an excerpt from my master's thesis titled "Semi-supervised morphological reinflection using rectified random variables". Languages use suffixes and prefixes to convey context, stress, intonation, and grammatical meaning (like subject-verb agreement). Such suffixes and prefixes form a more general class of entities which are the meaningful sub-parts of a word; these are called morphemes. A language's morphology refers to the rules and processes through which morphemes are combined; this allows a word to express its syntactic categories and semantic meaning. For example, in English, a verb can have three tenses: past, present, and future. These are the inflected forms of the verb.

bernoulli distribution, inflected form, morphological feature, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)


Inflection system of a language as a complex network

Fukś, Henryk

arXiv.org Artificial Intelligence, Jul-6-2010

We investigate the inflection structure of a synthetic language using Latin as an example. We construct a bipartite graph in which one group of vertices corresponds to dictionary headwords and the other group to inflected forms encountered in a given text. Each inflected form is connected to its corresponding headword, which in some cases is non-unique. The resulting sparse graph decomposes into a large number of connected components, to be called word groups. We then show how the concept of the word group can be used to construct coverage curves of selected Latin texts. We also investigate a version of the inflection graph in which all theoretically possible inflected forms are included. The distribution of sizes of connected components of this graph resembles cluster distribution in lattice percolation near the critical point.

graph, headword, inflected form, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIC-STH.2009.5444449

1007.1025

Country:

North America > United States > New York > Suffolk County > Hauppauge (0.04)
North America > Canada > Ontario > Niagara Region > St. Catharines (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.69)
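
The inflection network abstract above builds a bipartite graph of headwords and inflected forms and calls its connected components word groups. A small sketch of that decomposition follows, using union-find over a toy mapping from Latin forms to candidate headwords. The toy mapping and the function name are assumptions chosen for illustration; they are not the paper's data or code.

```python
from collections import defaultdict

def word_groups(form_to_headwords):
    """Return connected components of the bipartite form/headword graph."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each vertex with its side so a form and a headword that share
    # the same spelling stay distinct vertices.
    for form, headwords in form_to_headwords.items():
        for headword in headwords:
            union(("form", form), ("head", headword))

    groups = defaultdict(lambda: {"headwords": set(), "forms": set()})
    for node in list(parent):
        side, word = node
        key = "headwords" if side == "head" else "forms"
        groups[find(node)][key].add(word)
    return list(groups.values())

# Toy data: ambiguous forms such as "amor" and "regis" link two headwords.
toy = {
    "amor": ["amor", "amo"],
    "amoris": ["amor"],
    "amas": ["amo"],
    "regis": ["rex", "rego"],
    "regit": ["rego"],
    "puella": ["puella"],
}
for group in word_groups(toy):
    print(sorted(group["headwords"]), sorted(group["forms"]))
```

Ambiguous forms are what merge headwords into larger groups here: without "amor" and "regis", every headword in the toy data would sit in its own component.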
